Abstract: We consider the use of Forward Error Correction (FEC) to
reduce the in-order delivery delay over packet erasure channels. We 
propose a class of streaming codes that is capacity achieving and 
provides a superior throughput-delay trade-off compared to block codes 
by introducing flexibility in where and when redundancy is placed. 
This flexibility results in significantly lower in-order delay for a given 
throughput for a wide range of network scenarios. Furthermore, a major 
contribution of this paper is the combination of queuing and coding 
theory to analyze the code’s performance. Finally, we present simulation 
and experimental results illustrating the code's benefits. 

1 Introduction 

In this paper we study the in-order delivery delay of 
forward error-correction codes in packet switched networks 
where erasures are due to queue overflow, lossy links, etc. 
(i.e., a packet erasure channel). In-order delivery delay is 
an important metric, often of much interest in applications
where received information packets are buffered until
they can be delivered in-order. Any packet loss typically 
leads to increases in the in-order delivery delay that can 
adversely affect upper layer performance. For example, 
traditional ARQ takes on the order of a round-trip time, 
or more, to recover from a lost packet. Not only is the 
lost packet delayed, but all subsequent information packets 
must be buffered until the loss is corrected (i.e., "head-of- 
line blocking"). Our interest is in the design and analysis of 
forward error-correction codes that mitigate this problem 
and are specifically tailored so that they achieve low in- 
order delivery delay. 

Traditionally, the primary aim when designing error-
correction codes is to maximise throughput. The inherent
trade-off between throughput and delay is, of course,
recognised in this work (e.g., in the analysis of ARQ
schemes with delayed feedback [?]), but it is very
much a secondary concern. In contrast, our interest is in
sacrificing some throughput in order to achieve much lower
in-order delivery delay. This is motivated by the observation
that bandwidth is relatively abundant in modern
networks, but delay continues to be a major concern for
many applications. The use of some bandwidth to
lower delay is therefore an appealing proposition.

Fig. 1: Example of two codes with different throughput-
delay characteristics. Shaded squares indicate coded
packets, unshaded squares indicate information packets.


This brings the trade-off between throughput and delay 
to the forefront; and, in particular, it raises questions as to 
whether new codes can be constructed that may achieve a 
more favourable trade-off than traditional codes. Consider,
for example, a rate $k/n$ systematic block code where a
block consists of $k$ information packets followed by $n - k$
coded packets. This is illustrated in Figure 1(a). Suppose
that the code is an ideal one in the sense that receipt 
of any k of the n packets allows all of the k information 
packets to be reconstructed. Furthermore, assume that the 
first information packet is lost. All remaining information 
packets have to be buffered until the first coded packet 
is received. At this point, the first information packet 
can be reconstructed and all of the information packets 
can be delivered in-order. The in-order delivery delay is 
therefore proportional to $k$. Alternatively, suppose that
the $n - k$ coded packets are distributed uniformly among
the information packets, rather than all being placed after
the $k$ information packets. To keep the code causal, suppose
that each coded packet only protects the preceding
information packets in the block. This is still a block
code, but the coded packets and their locations differ from
the classical setup (see Figure 1(b)). Assume again that
the first information packet is lost. This loss can now be
recovered on receipt of the first coded packet, resulting in a
delay that is now proportional to $k/(n-k)$ (i.e., this is much
lower than $k$ when $n$ is large). Of course, the impact on
throughput due to the fact that each coded packet only 
protects the preceding information packets requires further 
analysis. 

Our main contributions are as follows. While our proposal
falls within the class of streaming codes similar to
Ho et al. [?], the novelty of our code construction comes
from the flexibility introduced with regard to the location
where redundancy is placed. We show that these codes
achieve capacity, yet they provide a superior throughput-
delay trade-off compared to block codes by achieving
significantly lower in-order delay for a given throughput.
The mathematical analysis of throughput and delay is
a primary contribution of the paper, requiring the development
of a number of analytic tools derived from
both coding and queuing theory. By building on the work
in [7], our approach accommodates a wide variety of
coding scenarios and develops a novel interplay
between the two areas of research. We further demonstrate
the performance of these low delay codes by evaluating
them using both simulations and experimental measurements.
While the mathematical analysis is confined to
operation without feedback, we use the simulations and
experiments to show that our proposed code construction
can be easily combined with feedback to provide larger
gains.

2 Related Work 

2.1 Streaming Codes 

We use the term streaming code to refer to codes which 
do not require a bit/packet stream to be partitioned into 
blocks or generations before coding operations can occur. 
Probably the most common examples are convolutional 
codes. Most work on the performance of convolutional
codes has focused on their error-correcting capabilities,
particularly against bursty errors, and only a few studies have
investigated delay. For example, Martinian et al. [5] and Hehn
et al. [3] investigate the decoding delay of convolutional
codes specifically designed for channels with burst errors;
and Lee et al. [?] investigate the delay performance of
orchard codes (a class of convolutional codes).

Classical convolutional codes do not lend themselves to
recoding at intermediate nodes within a network. This
has motivated work on convolutional network codes (proposed
in [12] and studied extensively in [?], [?], [?], [?]).
Delay bounds for convolutional network codes are
provided by Guo et al. [?]. Other codes of
this type, such as those proposed by Joshi et al. [?], have
also shown promising delay results. However, limitations in the
models used within this work make it unclear whether
or not their results can be extended to the scenarios in
which we are interested. We note that Tomoskozi et al. [?]
introduce a non-systematic code similar to our low delay
code. Experimental results show significant delay gains
over Reed-Solomon codes, but no analysis is provided.

2.2 Block Codes 

Block codes require a bit/packet stream to be partitioned
into generations or blocks, each generation or block being
treated independently from the rest. For example,
assume that a message of $N$ information packets $u_m$,
$m = 1, \dots, N$, is partitioned into blocks or generations
of size $k$ packets. Coded packets $c_{i,j}$, $i = 1, \dots, \lceil N/k \rceil$,
$j = 1, 2, \dots$, are then generated separately for each block.
If the code is systematic, the information packets in each
block/generation are transmitted first with the coded
packets transmitted after to help recover from any errors
or erasures (see Figure 1(a)). If the code is not systematic,
only the coded packets are transmitted. Previous work has
primarily focused on the decoding delay of non-systematic
constructions [20], [21], [22], [23], [24], [25], [26], [27], and
shows that the decoding delay is essentially proportional to
the block size. The in-order delivery delay of systematic
block codes is lower than that of non-systematic codes but
remains essentially proportional to the block size (with
a smaller pre-factor than for non-systematic codes), see
Cloud et al. [28] and references therein.

2.3 Baseline Block Code 

We use the following systematic random linear block code
as a baseline for performance comparison. This block code
is similar to that considered in [28] and is constructed
as follows. We generate $n - k$ coded packets, $c_{i,j}$,
$i = 1, \dots, \lceil N/k \rceil$, $j = 1, \dots, n - k$, from each block of $k$
information packets, which results in a code of rate $k/n$. Each
coded packet is a weighted random linear combination of
the information packets within its block, i.e.,

$$c_{i,j} = \sum_{m=(i-1)k+1}^{ik} w_{i,j,m}\, u_m, \tag{1}$$

where each information packet $u_m$ is treated as a vector in
an appropriate finite field $\mathbb{F}$ of size $Q$ and the coefficients
$w_{i,j,m}$ are drawn i.i.d. uniformly at random from $\mathbb{F}$. It
should be noted that calculations in field $\mathbb{F}$ are always
carried out over symbols of size $Q$ rather than the packets
themselves. A systematic code is obtained by first transmitting
the $k$ information packets followed by the $n - k$
coded packets.

Maximum likelihood decoding (i.e., Gaussian elimination)
is used. Should an erasure occur, any coded packet
can be used to help reconstruct/decode the missing information
packet. For sufficiently large field size $Q$ and no
more than $n - k$ erasures, any combination of $k$ packets
in each block can be used to recover from the erasures
with high probability. If more than $n - k$ erasures occur,
a decoding failure occurs and some of the information
packets within the block may be unrecoverable. Otherwise,
information packets will be recovered with high probability
when at least $k$ packets have been successfully received.
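
The failure condition above can be illustrated numerically. The following is a minimal sketch, assuming i.i.d. packet erasures with probability e and an ideal code (large field size), so that a block fails exactly when more than n - k of its n packets are erased. The function name is hypothetical and is not taken from the paper.

```python
from math import comb


def block_failure_probability(n: int, k: int, e: float) -> float:
    """Probability that more than n - k of the n packets in a block are erased.

    Assumes i.i.d. erasures with probability e and an ideal code, i.e. any k
    received packets are enough to decode the block.
    """
    return sum(
        comb(n, m) * e**m * (1 - e) ** (n - m)
        for m in range(n - k + 1, n + 1)
    )


if __name__ == "__main__":
    # Rate 3/4 block code on a channel with 10% erasures.
    print(block_failure_probability(n=4, k=3, e=0.1))
```

Increasing n at a fixed rate below capacity drives this probability towards zero, which is the sense in which the baseline is asymptotically capacity achieving, at the cost of a delay that grows with the block size.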

This code is asymptotically capacity achieving over
erasure channels as the block size $n \to \infty$ [?]. It also provides
a good baseline for comparison since it is a modern, high
performance code that has a number of optimality properties
(i.e., it is representative of the best possible block
code performance). In particular, this code minimises the
probability of a decoding failure for any given coding rate
over a large class of block codes in addition to minimizing
the decoding delay [29].

3 Low-Delay Coding Over a Stream 

Our interest is in constructing a code that provides low
in-order delivery delay while providing protection against
errors/erasures. We consider a systematic random linear
streaming code construction that inserts coded packets at
strategic locations within an uncoded/information packet
stream.

Fig. 2: Illustrating the setup considered. Sequence $\{u_j\}$
of information packets is interleaved with sequence $\{c_i\}$
of coded packets (indicated as shaded) and transmitted.
Slots correspond to a single packet transmission and are
indexed $1, 2, \dots$.

Assume that time is slotted and each slot is indexed
as $t = 1, 2, \dots$. Within each slot, a single packet can
be transmitted. The code is constructed by interleaving
information packets (i.e., uncoded packets) $u_j$, $j = 1, 2, \dots$
with coded packets $c_i$, $i = 1, 2, \dots$. For reasons that will
be explained later, one coded packet is inserted after every
$l - 1$ information packets and transmitted over the network
or channel. This results in a code of rate $(l-1)/l$. Figure 2
illustrates this code construction.

Each coded packet is generated by taking random linear
combinations from a span of previously transmitted
information packets $\{u_L, \dots, u_U\}$. This span is referred
to as the coding window. We will assume in the analysis
that $L = 1$ and $U$ is the index of the information packet
immediately preceding the generated coded packet (i.e.,
$U = (l - 1)i$ assuming the generated coded packet is $c_i$).
Therefore, coded packet $c_i$ is generated as follows:

$$c_i = f_i\left(u_1, u_2, \dots, u_{(l-1)i}\right) := \sum_{j=1}^{(l-1)i} w_{ij}\, u_j, \tag{2}$$

where each packet $u_j$ is treated as a vector in $\mathbb{F}_Q$ and
each coefficient $w_{ij} \in \mathbb{F}_Q$ is chosen randomly from an i.i.d.
uniform distribution.
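
As an illustration of the encoding rule with $L = 1$, the sketch below builds coded packet $c_i$ as a random linear combination of $u_1, \dots, u_{(l-1)i}$. It is only a sketch under stated assumptions: arithmetic is done modulo a prime `P` (a prime field is assumed here for simplicity, whereas practical implementations often use $GF(2^8)$), packets are lists of symbols, and the names `P` and `coded_packet` are hypothetical.

```python
import random

P = 257  # assumed prime field size Q (illustrative choice)


def coded_packet(info, i, l, rng):
    """Return (coefficients, payload) for coded packet c_i.

    info : list of information packets u_1, u_2, ..., each a list of
           symbols in {0, ..., P-1}, all of the same length
    i    : index of the coded packet (1-based)
    l    : one coded packet is inserted after every l - 1 information packets
    rng  : random.Random instance used to draw the coefficients
    """
    window = info[: (l - 1) * i]                  # coding window u_1..u_{(l-1)i}
    coeffs = [rng.randrange(P) for _ in window]   # w_ij, i.i.d. uniform
    n_symbols = len(info[0])
    payload = [
        sum(w * u[s] for w, u in zip(coeffs, window)) % P
        for s in range(n_symbols)
    ]
    return coeffs, payload


if __name__ == "__main__":
    rng = random.Random(0)
    u = [[1, 2], [3, 4], [5, 6]]
    print(coded_packet(u, i=1, l=4, rng=rng))
```

The coefficients are returned alongside the payload because the receiver needs them to build its decoding matrix.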

Before proceeding, it should be noted that in practice
the lower edge of the coding window can be determined in
a variety of ways. The only constraint is that the coding
window must be large enough to ensure intermediate decoding
opportunities at the receiver. For example, suppose
that the receiver has received or decoded all information
packets up to and including packet $u_j$. Feedback can be
used to communicate this to the transmitter, allowing it
to adjust the coding window so that the lower edge of the
window is $u_L = u_{j+1}$ for all subsequent coded packets.
The generator matrix shown in Figure 3 illustrates this
sliding window approach, where the columns indicate the
information packets that need to be sent and the rows
indicate the composition of the packet transmitted at any
given time. Its encoding/decoding complexity is analysed
in Section 6.

The receiver decodes on-the-fly once enough packets/degrees
of freedom have been received. In more detail,
the receiver maintains a generator matrix $G_t$ at time $t$
which is similar to that shown in Figure 3 except that it is
composed only of the coefficients obtained from received
packets. If $G_t$ is full rank, Gaussian elimination is used to
recover from any packet erasures/errors that may have occurred
during transit. More details are provided in Section 5.
In our analysis, unless stated otherwise, we will make
the standing assumption that the field size $Q$ is sufficiently
large so that with probability one each coded packet helps
the receiver recover from one information packet erasure.
Specifically, each coded packet row added to generator matrix
$G_t$ increases the rank of $G_t$ by one. This assumption
is relaxed in Section 5 where the probability of a decoding
failure is considered; and our experimental measurements
do not, of course, make this assumption.

Fig. 3: Example generator matrix for the low delay
code with sliding window showing the coefficients used
to produce each packet. In this example, we assume
that the transmitter has obtained knowledge from the
receiver by time 10 indicating that it has successfully
received/decoded packets $u_1$ and $u_2$, allowing it to adjust
the lower edge of the coding window to exclude them from
packet $c_2$.

Unlike the codes described in Section 2, the code described
here generates coded packets that are (i) individually
streamed between information packets (rather
than being transmitted in groups of size $k$ packets) and
(ii) each coded packet protects all preceding information
packets (rather than just the information packets within
its block). Furthermore, for a given code rate it is easy to
see that this code construction should tend to decrease the
overall in-order delivery delay at the receiver compared to
a block code (recall the example in Section 1). However,
the challenge is to quantify the delay performance of the
low delay code. We also note that the causal construction
used will limit the power of the low delay code and so its
throughput performance also needs to be analysed.

4 In-Order Delivery Delay and Throughput 

4.1 In-Order Delivery Delay 

Information packets are delivered in-order at the receiver 
until an erasure of an information packet occurs. Upon 
erasure, in-order delivery is paused (arriving packets are 
buffered) until the decoder receives as many coded packets 
as the number of erasures, at which point in-order delivery 
resumes. 

Let $\{\tilde{t}_i\}$ denote the sequence of slot times at which
erasure of an information packet pauses in-order delivery
and $\{T_i\}$ the corresponding sequence of times at which in-
order delivery resumes. Note that $T_i$ must be a slot at
which a coded packet is transmitted. Letting $t_i = l \lfloor \tilde{t}_i / l \rfloor$
be the coded packet slot immediately preceding slot $\tilde{t}_i$,
we can then define the sequence of coded packet slots
$\{t_1, T_1, t_2, T_2, \dots\}$. See Figure 4 for a schematic illustration.
Slots $\{t_i + 1, t_i + 2, \dots, T_i\}$ contain information
packets delayed by the $i$th pause, plus perhaps non-
delayed packets in the slots from $t_i + 1$ up to the erasure
slot $\tilde{t}_i$, and this set of slots is referred
to as the $i$th “busy” period. Slots $\{T_i + 1, \dots, t_{i+1}\}$ can be
partitioned into intervals $\{T_i + 1, \dots, T_i + l\}$, $\{T_i + l + 1, \dots, T_i + 2l\}$,
etc., each of size $l$ slots and ending with a coded packet slot
(since $T_i$ and $t_i$ are both coded packet slots). Each of these
intervals of $l$ slots is referred to as an “idle” period.

Fig. 4: Illustrating notation used. Clear rectangles indicate
information packets, shaded rectangles coded packets,
crosses indicate erasures, coded packets are inserted every
$l = 4$ slots, $t_i$ is the coded packet slot immediately preceding
the information packet erasure at slot $\tilde{t}_i$ which pauses
in-order delivery at the receiver, $T_i$ the coded packet
slot at which in-order delivery resumes. The information
packet at slot $t_i + 1$ is delivered without delay, but any
information packets in slots $\{\tilde{t}_i, \dots, T_i\}$ are delayed.

The busy/idle period terminology is analogous to a
queueing system operating in embedded time corresponding
to the coded packet slots. Information packet erasures
can be thought of as queue arrivals and reception of coded
packets as queue service. Pauses in in-order delivery then
correspond to periods when the queue size is non-zero.
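
The queueing analogy can be sketched in a few lines of code. The sketch below assumes, as in the analysis, that every received coded packet removes exactly one outstanding erasure, and that coded packets occupy slots $l, 2l, \dots$. It labels each interval of a given erasure pattern as idle (value 0) or accumulates it into a busy period, returning the number of coded packet slots each busy period spans. The function name is hypothetical.

```python
def busy_periods(erased, l):
    """Split an erasure pattern into busy/idle periods.

    erased : list of bools, erased[t - 1] is True if the packet sent in
             slot t was erased (slots l, 2l, ... carry coded packets)
    l      : one coded packet is sent after every l - 1 information packets
    Returns a list with 0 for each idle period and, for each completed busy
    period, the number of coded packet slots it spans.
    """
    periods = []
    backlog = 0   # outstanding information packet erasures (queue size)
    length = 0    # coded packet slots elapsed in the current busy period
    for end in range(l, len(erased) + 1, l):
        interval = erased[end - l:end]
        info_erasures = sum(interval[:-1])   # queue arrivals
        coded_received = not interval[-1]    # queue service opportunity
        if backlog == 0 and info_erasures == 0:
            periods.append(0)                # idle period
            continue
        backlog += info_erasures
        length += 1
        if coded_received:                   # backlog >= 1 here
            backlog -= 1
        if backlog == 0:                     # in-order delivery resumes
            periods.append(length)
            length = 0
    return periods


if __name__ == "__main__":
    pattern = [False] * 16
    for slot in (2, 9, 10):                  # erased information packets
        pattern[slot - 1] = True
    print(busy_periods(pattern, l=4))
```

A busy period still in progress when the pattern ends is not reported.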

Index the busy/idle periods by $j = 1, 2, \dots$ and let
$i(j)$ be the index of the pause corresponding to the $j$'th
busy period (i.e., the $j$'th busy period consists of slots
$\{t_{i(j)} + 1, \dots, T_{i(j)}\}$). With the $j$'th period we associate a
random variable $S_j$, with $S_j = 0$ for an idle period
and $S_j = (T_{i(j)} - t_{i(j)})/l$ for a busy period (i.e., $S_j$
equals the number of coded packets transmitted before
delivery resumes). Since packet erasures are i.i.d., the
busy/idle periods form a renewal process and the $\{S_j\}$
are i.i.d. Letting $S \sim S_j$, the following theorem completely
characterises the probability distribution of the busy time
$S$ and is one of our main results.

Theorem 1 (Busy Time). In an erasure channel with
erasure probability $e$, suppose we insert a coded packet in
between every $l - 1$ information packets. Assume that each
coded packet can help us to recover from one erasure. We
have:

I. For all values of $e$ and $l$ such that $le < 1$, the mean of
the probability distribution of $S$ exists and is finite.

II.

$$P_S(s) = \begin{cases} (1-e)^{l-1} & \text{for } s = 0 \\ (l-1)\,e\,(1-e)^{l-1} & \text{for } s = 1 \\ [\ldots] & \text{for } s > 1 \\ 0 & \text{otherwise.} \end{cases} \tag{3}$$

III.

$$E(S) = \frac{(l-1)\,e\,(1-e)^{l-1}}{1 - le} \tag{4}$$

$$E(S^2) = E(S) + \frac{l(l-1)\,e^2\,(1-e)^{l}}{(1 - le)^3} \tag{5}$$

Proof: See Appendix. □

Fig. 5: Illustrating delay introduced by erasures. In this
example the information packet at slot $\tilde{t}_i$ is delayed by 6
slots, the information packet at slot $\tilde{t}_i + 1$ by 5 slots, the
information packet at $\tilde{t}_i + 3$ by 3 slots and so on. It can
be seen that the sum-delay is the area under a triangle
of base $T_i - \tilde{t}_i$ slots and height $T_i - \tilde{t}_i$ slots, less the area
associated with any coded packets.

Observe that the requirement that $le < 1$ for $S$ to have
finite mean is a natural one. The rate of the coding scheme
is $R = \frac{l-1}{l} = 1 - \frac{1}{l}$. Since this rate of transmission should
be less than the channel capacity, we require $R < 1 - e$,
and so $le < 1$.
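
The closed forms for the first two moments of the busy time can be checked against a direct simulation. The sketch below evaluates $E(S) = (l-1)e(1-e)^{l-1}/(1-le)$ and $E(S^2) = E(S) + l(l-1)e^2(1-e)^l/(1-le)^3$, and draws samples of $S$ under the same assumptions as the theorem (i.i.d. erasures, each received coded packet repairs one erasure). All function names are hypothetical.

```python
import random


def mean_busy_time(l: int, e: float) -> float:
    """E(S) from Theorem 1; requires l * e < 1."""
    return (l - 1) * e * (1 - e) ** (l - 1) / (1 - l * e)


def second_moment_busy_time(l: int, e: float) -> float:
    """E(S^2) from Theorem 1; requires l * e < 1."""
    return mean_busy_time(l, e) + l * (l - 1) * e**2 * (1 - e) ** l / (1 - l * e) ** 3


def sample_busy_time(l: int, e: float, rng: random.Random) -> int:
    """Draw one realisation of S by simulating the erasure/repair queue."""
    backlog = sum(rng.random() < e for _ in range(l - 1))
    if backlog == 0:
        return 0                      # idle period
    s = 1
    if rng.random() >= e:             # first coded packet received
        backlog -= 1
    while backlog > 0:
        backlog += sum(rng.random() < e for _ in range(l - 1))
        s += 1
        if rng.random() >= e:         # coded packet received
            backlog -= 1
    return s


if __name__ == "__main__":
    rng = random.Random(1)
    l, e, runs = 4, 0.1, 100_000
    estimate = sum(sample_busy_time(l, e, rng) for _ in range(runs)) / runs
    print(estimate, mean_busy_time(l, e))
```

The simulated mean should approach the closed form as the number of runs grows. Near $le = 1$ the busy periods become long and the estimate converges slowly.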

We also emphasise that the in-order delivery delay
expressions in Theorem 1 are exact (they are not bounds)
and have an easy to evaluate closed-form (they are not
combinatorial in nature). This is notably different from
previous analysis of in-order delivery delay and derives
from the favourable structure of the low-delay coding
scheme.

To continue the analysis, we introduce the random
variable $S^+ = \max\{S, 1\}$. $S^+$ helps us to count the
number of intervals in the communication interval $T$. It
is straightforward to compute the probability distribution
of $S^+$ as follows:


Corollary 1. Let $S^+ = \max\{S, 1\}$. We have:

I. For all values of $e$ and $l$ such that $le < 1$, the mean of
the probability distribution of $S^+$ exists and is finite.

II.

$$P_{S^+}(s) = \begin{cases} (le + 1 - e)(1-e)^{l-1} & \text{for } s = 1 \\ P_S(s) & \text{for } s > 1 \\ 0 & \text{otherwise.} \end{cases} \tag{6}$$

III.

$$E(S^+) = \frac{(1-e)^{l}}{1 - le} \tag{7}$$

Combining Theorem 1 and Corollary 1 with the following
result allows us to obtain a simple closed-form bound
on the mean in-order delivery delay:

Theorem 2 (In-Order Delivery Delay). At the receiver,
the asymptotic mean in-order delivery delay for information
packets is upper bounded by […] slots.

Proof: See Appendix. □

Fig. 6: Measured mean in-order delivery delay and upper
bound versus code rate $(l-1)/l$ for different i.i.d. packet
erasure rates $e$.

This upper bound on the in-order delay is compared
with simulated results in Fig. 6. It can be seen
that the upper bound is tight at both low and high coding
rates. Furthermore, it is reasonably tight at intermediate
coding rates. Despite its simple form, it is therefore quite
powerful.


4.2 In-Order Delivery Delay where Coded Packets are 
Placed in Groups 


Before determining the throughput of the
low delay code, we first justify why we
chose to send only one coded packet at a time to recover
from losses. Consider the coding scheme where $c$ coded
packets are transmitted after every $l_g - c$ information
packets (i.e., coded packets are transmitted in groups of
$c$ packets). Note that this reduces to the low-delay coding
scheme when $c = 1$.

Following the same steps as before, we define the process
$S_g$ as the busy period of the decoding operation. While
extending the proof technique used previously is difficult
in this case, results from lattice theory can be used to
prove the following:

Theorem 3 (Busy Time for Group Coded Packets). In
an erasure channel with erasure probability $e$, suppose we
insert $c$ coded packets in between every $l_g - c$ information
packets. Assume that each coded packet can help us to
recover from $c$ erasures. We have:

I. For all values of $e$ and $l_g$ such that $l_g e < c$, the mean of
the probability distribution of $S_g$ exists and is finite.

II.

$$P_{S_g}(s) = \begin{cases} (1-e)^{l_g - c} & \text{for } s = 0 \\ \sum_{i=1}^{c} \sum_{j=0}^{c-i} \binom{l_g - c}{i} \binom{c}{j} f(e, i + j, l_g) & \text{for } s = 1 \\ \sum_{i=0}^{c-1} NP(l_g, c, s, i)\, f(e, sc - i, s l_g) & \text{for } s > 1 \\ 0 & \text{otherwise,} \end{cases} \tag{8}$$

where $f(e, x, y) = e^{x}(1 - e)^{y - x}$ and the exact value of
$NP(l_g, c, s, i)$ is computed in the proof of the theorem.

Proof: See Appendix. □

Fig. 7: Average in-order delivery delay per information
packet, where $c$ is the number of coded packets at the end of
each interval, $e = 0.1$.

Theorem 3 does not provide a closed-form expression
of the mean in-order delay when coded packets are sent
in groups, but it does allow numerical calculation of the
in-order delivery delay.

Fig. 7 shows the calculated delay per information packet
vs $c$. In order to provide a fair comparison for different
choices of $c$ the coding rate is held constant, i.e., $1 - c/l_g =
1 - 1/l$ is held constant as $c$ is varied. It can be seen that
the delay is an increasing function of $c$. In other words,
the delay is minimized when only a single coded packet is
transmitted at a time (i.e., $c = 1$).
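
The effect of grouping can also be explored by simulation rather than through Theorem 3. The sketch below is a Monte Carlo estimate, not the numerical method used in the paper. It assumes i.i.d. erasures, that every received coded packet repairs one outstanding erasure, and that the delay of an information packet is the number of slots between its transmission and its in-order delivery. The function name and parameters are hypothetical.

```python
import random


def mean_in_order_delay(l_g, c, e, n_blocks, rng):
    """Estimate the mean in-order delivery delay (in slots) per delivered
    information packet when c coded packets follow every l_g - c
    information packets."""
    backlog = 0        # unrecovered information packet erasures
    waiting = []       # transmission slots of packets awaiting delivery
    total_delay = 0
    delivered = 0
    slot = 0
    for _ in range(n_blocks):
        for pos in range(l_g):
            slot += 1
            erased = rng.random() < e
            if pos < l_g - c:                  # information packet slot
                if erased:
                    backlog += 1
                    waiting.append(slot)
                elif backlog > 0:
                    waiting.append(slot)       # buffered behind an erasure
                else:
                    delivered += 1             # delivered at once, zero delay
            elif not erased and backlog > 0:   # useful coded packet
                backlog -= 1
                if backlog == 0:               # in-order delivery resumes
                    total_delay += sum(slot - s for s in waiting)
                    delivered += len(waiting)
                    waiting.clear()
    return total_delay / delivered


if __name__ == "__main__":
    rng = random.Random(7)
    # Same code rate 0.8 in each case: l_g = c * l with l = 5.
    for c in (1, 2, 4):
        print(c, mean_in_order_delay(5 * c, c, 0.1, 200_000 // (5 * c), rng))
```

Holding $c/l_g$ fixed while increasing $c$ keeps the rate constant, matching the comparison described above.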


4.3 Throughput and Rate 

We can expect that the improved delay performance of 
the low delay coding scheme carries a throughput price. 
However, it turns out that this price is a small one. For a 
stream of N packets, a decoding error may occur since it 
is possible that a burst of errors near the end of the stream 
may not allow for sufficient time to transmit the necessary 
coded packets to recover the lost information packets. 
That is, a number of information packets at the end of a 
transmission may be lost. Define the good throughput GT 
as the ratio of the number of information packets delivered 
to the receiver and the number of packets transmitted by 
the transmitter. The good throughput is a random variable 
and its behavior is characterized in the following theorem: 

Theorem 4 (Throughput). Consider the transmission of
a coded stream of length $N$ over an erasure channel and
suppose that $le < 1$. For any $R_0 < 1 - \frac{1}{l}$ and $\delta > 0$, there exists an
$N$ large enough such that

$$\Pr(GT > R_0) > 1 - \delta. \tag{9}$$

Proof: See Appendix. □

Recall that the rate of transmission of the low delay
coding scheme is $\frac{l-1}{l} = 1 - \frac{1}{l}$ and that the capacity of
the erasure channel is $1 - e$, so the condition $le < 1$ allows
all coding rates up to the channel capacity. Theorem 4
therefore tells us that the low delay code is asymptotically
capacity achieving as $N \to \infty$.

In other words, the fraction of information packets
lost can be made arbitrarily small for sufficiently long
transfers. This is because the length of the possible failure
at the end of a transmission is independent of the length
of the transmission. Therefore, the sacrifice in the good
throughput becomes negligible as the length of the transmission
grows. Of course, losing any information packet
is undesirable. However, an easy extension of the coding
scheme is to send a small number of additional coded
packets at the end of a transfer (or alternatively to use
feedback, see below). This will allow any straggling information
packet erasures to be recovered without adversely
affecting throughput (or mean delay) when the connection
size $N$ is large.

5 Decoding Failure Probability

5.1 Computation of Decoding Failure Probability

Recall that following an erasure at time $\tilde{t}_i$ the receiver
pauses in-order delivery of packets. As packets are received,
the receiver maintains a generator matrix, with $G_t$
denoting this generator matrix at time $t > \tilde{t}_i$. This matrix
contains $N_t \le t - \tilde{t}_i$ rows where $N_t$ is a random variable
with value equal to the number of packets received in the
interval from time $\tilde{t}_i$ to time $t$. Reception of information
packet $j$ adds a row to $G_t$ with element $j$ equal to 1 and
all other elements equal to zero. Reception of coded packet
$i$ adds a row with elements 1 through $(l - 1)i$ equal to
coefficients $w_{ij}$ and other elements equal to zero. When
$G_t$ reaches full rank, decoding succeeds and all information
packets sent up to time $t$ are recovered. An example of $G_t$
is shown in Figure 3.

Until now, we have assumed that every received packet
increases the rank of $G_t$. In fact there is a small probability
that this does not occur since the coding coefficients are
selected uniformly at random. For example, should the
coefficients of two coded packets be the same, the effect is the same as
if one of the received coded packets had been erased, resulting in
the appearance of a larger erasure rate. While obtaining
an analytic expression for this increase in erasure rate is
difficult, it can be readily calculated numerically. Table 1
shows calculated values of the decoding failure probability
for a range of field sizes $Q$ and code rates
parameterised by $l$ on a channel with erasure rate $e = 0.1$.
The table shows that the decoding failure probability is
extremely small, even for small field sizes such as $Q = 2$
(i.e., the binary field). As a result, decoding failures can be
neglected for most practical purposes and the simplifying
assumption in the preceding section is reasonable.


              Q = 2^x
         x = 2          x = 3          x = 4          x = 5          x = 6
l = 5    $10^{-16.71}$  $10^{-20.2}$   ~ 0            ~ 0            ~ 0
l = 6    $10^{-14.32}$  $10^{-16.22}$  $10^{-18.68}$  $10^{-22.74}$  ~ 0
l = 7    $10^{-12.81}$  $10^{-13.31}$  $10^{-16.21}$  $10^{-19.03}$  ~ 0

TABLE 1: Decoding failure probability per packet, e = 0.1
and ~ 0 indicates zero to within numerical precision.
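
To make the role of the generator matrix $G_t$ concrete, the following sketch builds its rows as described in Section 5.1 and computes the rank by Gaussian elimination. It is only an illustration under stated assumptions: arithmetic is modulo a prime `p` (the paper's field $\mathbb{F}_Q$ need not be prime), and the helper names `info_row`, `coded_row` and `rank_mod_p` are hypothetical.

```python
import random


def info_row(j, n):
    """Row added on reception of information packet u_j (1-based), n columns."""
    return [1 if col == j - 1 else 0 for col in range(n)]


def coded_row(i, l, n, p, rng):
    """Row added on reception of coded packet c_i: random coefficients in
    columns 1..(l-1)i, zeros elsewhere."""
    width = min((l - 1) * i, n)
    return [rng.randrange(p) for _ in range(width)] + [0] * (n - width)


def rank_mod_p(rows, p):
    """Rank of a matrix over the integers modulo a prime p."""
    rows = [[x % p for x in r] for r in rows]
    rank = 0
    n_cols = len(rows[0]) if rows else 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)          # modular inverse (Python 3.8+)
        rows[rank] = [(x * inv) % p for x in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                factor = rows[r][col]
                rows[r] = [(a - factor * b) % p for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


if __name__ == "__main__":
    rng = random.Random(3)
    p, l, n = 257, 3, 4
    # u_2 is erased; u_1, c_1, u_3, u_4 are received.
    G = [info_row(1, n), coded_row(1, l, n, p, rng), info_row(3, n), info_row(4, n)]
    print(rank_mod_p(G, p), "of", n)
```

Decoding succeeds once the rank equals the number of information packets sent. Two coded rows with identical coefficients add only one to the rank, which is the event that the large-field assumption rules out.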


5.2 Analytic Bound 

While the numerical calculation is significantly less conservative,
we note that it is also possible to upper bound
the decoding failure probability. Suppose that $k$ erasures
occur where the erasure pattern is admissible (as defined
in Lemma 1$^{1}$), and decoding is attempted after receiving
exactly $k$ coded packets. Let $E_1, E_2, \dots, E_k$ denote the
number of erasures in each $l$-interval, and $G' \in \mathbb{F}_Q^{k \times k}$
be the admissible decoding matrix obtained by removing
the rows and columns in generator matrix $G_{kl}$ that are
associated with received information packets so that the
dimension of the decoding matrix is equal to the number
of erasures. The number of non-zero elements in each row $i$
is equal to $\sum_{z=1}^{i} E_z$. Furthermore, the number of erasures
in the first $k'$ $l$-intervals, for any $k'$ less than $k$, is strictly greater
than $k'$ since the decoding does not stop before coded
packet $k$.


Theorem 5 (Decoding failure for $S = k$). Consider an
admissible decoding matrix $G'$, and assume that its elements
are drawn identically and independently, uniformly
at random from a field with size $Q$. The probability that the
matrix is full rank is bounded as follows:

$$\Pr(\mathrm{rank}(G') = k) \le \prod [\ldots] \tag{10}$$

$$\Pr(\mathrm{rank}(G') = k) \ge [\ldots] \left(1 - [\ldots]\right) \tag{11}$$

Proof: See Appendix. □

Note that these upper and lower bounds coincide for the
cases where $k = 1$ and $k = 2$ since there exists only one
admissible decoding matrix in each case.


Theorem 6 (Decoding failure for a stream of length $N$).
The decoding failure (DF) probability in a stream of length
$N$ satisfies

$$\lim_{N_t \to \infty} \frac{1}{N_t} \Pr(\mathrm{DF}) \le [\ldots] \tag{12}$$

Proof: See Appendix. □

1. Reminder: The term admissible is used here to mean that at
least two erasures should happen during a time $t = l$, at least
three within $2l$, and so on, so that the decoding process will remain
activated for the whole of the time $t = kl$.


Corollary 2 (Decoding Failure Probability). The decoding
failure probability satisfies

$$\lim_{N_t \to \infty} \frac{1}{N_t} \Pr(\mathrm{DF}) \le [\ldots] \tag{13}$$

where $e_0$ is the solution to the equation

$$e_0 (1 - e_0)^{l-1} = [\ldots]\,(1 - e)^{l-1}. \tag{14}$$

Since the function $f(e) = e(1 - e)^{l-1}$ is increasing for $e <
1/l$, the solution always exists and $e_0 < e$.


As $Q$ becomes larger, the decoding failure probability
goes to zero. Table 2 gives values for the upper bound in
Corollary 2 for a channel with erasure probability $e = 0.1$.
Comparing with Table 1, it can be seen that the bound is
not tight.


TABLE 2: The upper bound on decoding failure probability
per packet, e = 0.1

              Q = 2^{8x}
         x = 1         x = 2         x = 4         x = 10         x = 20
l = 5    $10^{-2.24}$  $10^{-4.65}$  $10^{-9.46}$  $10^{-23.45}$  $10^{-47.53}$
l = 6    $10^{-2.13}$  $10^{-4.54}$  $10^{-9.36}$  $10^{-23.42}$  $10^{-47.51}$
l = 7    $10^{-2.09}$  $10^{-4.50}$  $10^{-9.31}$  $10^{-23.43}$  $10^{-47.52}$


6 Encoding and Decoding Complexity 

The encoding and decoding complexity is dependent on
the management of the coding window. As already noted
in Section 3, to limit the complexity we can use a sliding
window approach that keeps track of the decoding
process at the receiver. This results in a complexity that
is polynomial in $E(S)$ and is considered in more detail
below. Alternatively, the encoding and decoding might
be constrained to the $kl$ preceding information packets.
This will result in a small probability of decoding failure
$\Pr(S > k)$, but this will go to zero exponentially as $k$ grows
(see Theorem 1).

The complexity of the sliding window scheme depends 
on the process S. Assuming that S' = /c, Gaussian elimi¬ 
nation can be performed at each step requiring approx¬ 
imately arithmetic operations.Using the elementary 
renewal theorem, we have: 


Proof: The proof is similar to the proof of the theo¬ 
rem |S] where the computation of E{S^) is similar to the 
computation of A(S^) in Theorem |TJ □ 

Note that the number of arithmetic operations per information
packet becomes large as the code rate approaches
the capacity. However, in the low delay regime (i.e., the
regime where the low delay code should be used) the
complexity is manageable (see Table 3).

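TABLE 3: The average number of arithmetic operations
performed in the decoder per information packet

 $\epsilon$ \ $l\epsilon$ |  0.5  |  0.6  |  0.7  |  0.8   |  0.9
--------------------------+-------+-------+-------+--------+--------
 0.02                     | 0.67  | 1.93  | 7.11  | 41.74  | 769.58
 0.1                      | 3.13  | 8.87  | 32.56 | 190.96 | 3525
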


7 Low Delay Code vs. Block Codes 

This section uses stochastic simulations and experimental
results to supplement the analysis provided in previous
sections. While the low delay code construction falls within
the category of streaming codes, we compare its performance
with systematic block codes because of their
widespread use and so that the block code can be used as a
baseline for comparison with other streaming codes, many
of which also use block codes for purposes of comparison.
We consider both the case when feedback is not used to
quickly recover from packet losses (i.e., open-loop) and the
case when it is used (i.e., closed-loop).

7.1 Simulation Results 

The block and low delay codes are generated as described
in Sections |3| and |3| respectively. We assume that the
alphabet size $Q$ for both codes is large enough so that
the probability of receiving linearly dependent packets is
zero. Therefore, when a block code is used, a decoding error
only occurs when the channel/network erases more than
$n - k$ of the $n$ packets in a block, where $k$ is the block
size and $c = k/n$ is the code rate.
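
As an illustrative sketch (not part of the original evaluation), the decoding error event described above can be evaluated directly for i.i.d. erasures: a block of $n$ packets fails when more than $n - k$ of them are erased. The function name `block_failure_probability` and the example block sizes are assumptions made for this illustration.

```python
from math import comb


def block_failure_probability(n: int, k: int, eps: float) -> float:
    """P(more than n - k of the n packets in a block are erased), i.i.d. erasures."""
    tolerable = n - k  # number of erasures an (n, k) block can absorb
    p_decodable = sum(
        comb(n, j) * eps**j * (1.0 - eps) ** (n - j) for j in range(tolerable + 1)
    )
    return 1.0 - p_decodable


if __name__ == "__main__":
    # Same code rate c = k/n = 0.8 with growing block size (hypothetical values).
    for n, k in [(5, 4), (20, 16), (80, 64)]:
        print(n, k, k / n, block_failure_probability(n, k, 0.05))
```

At a fixed code rate below capacity, the residual failure probability shrinks as the block grows, which is why holding the PER constant while raising the code rate forces larger blocks.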

Theorem 7. The total number of arithmetic operations $C_d$
in a stream of length $N_t$, which consists of $N_i = \frac{l-1}{l} N_t$
information packets, satisfies the following:

$$\lim_{N_t \to \infty} \frac{1}{N_i} C_d = \frac{3(1 - l\epsilon)}{2(l-1)(1-\epsilon)^{l}}\, E(S^3), \qquad (15)$$

where

$$E(S^3) = \frac{l(l-1)\,\epsilon^2 (1-\epsilon)^{l}\,(2 - 2\epsilon - 2l\epsilon^2 + l\epsilon^2 + l^2\epsilon^2)}{(1 - l\epsilon)^5} + E(S^2). \qquad (16)$$

Simulations are carried out for i.i.d. erasures with packet
erasure probability $\epsilon$, and for correlated packet erasures
described by a two-state Markov chain. The Markov chain
has a "good" state (i.e., state G) with packet erasure rate
$\epsilon = 0$, a "bad" state (i.e., state B) with packet erasure rate
$\epsilon = 1$, and the transition probability matrix


$$P = \begin{pmatrix} 1-\gamma & \gamma \\ \beta & 1-\beta \end{pmatrix}. \qquad (17)$$


Two parameters are used to generate the transition probabilities
$\beta$ and $\gamma$: the steady-state probability of state B,
$\pi_B = \gamma/(\gamma+\beta)$, and the expected burst length, $E(L) = 1/\beta$.
Within the figures presented in this section, we will refer to
the i.i.d. model by referencing either the erasure rate $\epsilon$ or the
2-tuple $(\pi_B, E(L) = 1)$ (note that $\pi_B$ equals the erasure
rate). When the correlated loss model is assumed, the
2-tuple $(\pi_B, E(L) > 1)$ will be used.
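
The two-state erasure model can be sketched in a few lines. This is a minimal illustration rather than the simulator used for the figures: it assumes the row order (G, B) for the transition matrix, takes $E(L) = 1/\beta$ as the mean sojourn time in state B, and the function names `markov_parameters` and `simulate_erasures` are hypothetical.

```python
import random


def markov_parameters(pi_b: float, mean_burst: float) -> tuple[float, float]:
    """Return (beta, gamma) for a target steady-state loss rate and mean burst length."""
    beta = 1.0 / mean_burst             # P(B -> G); assumes E(L) = 1 / beta
    gamma = beta * pi_b / (1.0 - pi_b)  # P(G -> B); solves pi_B = gamma / (gamma + beta)
    return beta, gamma


def simulate_erasures(pi_b: float, mean_burst: float, n: int, seed: int = 0) -> list[bool]:
    """Generate n erasure indicators (True = packet erased) from the two-state chain."""
    rng = random.Random(seed)
    beta, gamma = markov_parameters(pi_b, mean_burst)
    in_bad = rng.random() < pi_b  # start from the stationary distribution
    erased = []
    for _ in range(n):
        erased.append(in_bad)  # state B erases every packet, state G none
        if in_bad:
            in_bad = rng.random() >= beta  # remain in B with probability 1 - beta
        else:
            in_bad = rng.random() < gamma  # move to B with probability gamma
    return erased


if __name__ == "__main__":
    losses = simulate_erasures(pi_b=0.1, mean_burst=4.0, n=200_000, seed=1)
    print("empirical erasure rate:", sum(losses) / len(losses))
```

Parameterizing by $(\pi_B, E(L))$ instead of $(\beta, \gamma)$ keeps the long-run erasure rate fixed while the burstiness is varied.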





































Fig. 8: Open-loop mean in-order delivery delay, $E(D)$,
versus code rate for a systematic block code and the low
delay code ($E(L)$ is the expected packet erasure burst
duration). Panels: (a) i.i.d. packet losses with $\epsilon = 0.05$;
(b) correlated packet losses with $\pi_B = 0.1$.

We first consider the open-loop case where feedback
communicating the successful reception of a packet is
unavailable. In this case there is always a non-zero probability
of decoding failure for the block code, corresponding
to the event that the number of packet losses over a
block exceeds the number of coded packets in the block.
Therefore, there remains a non-zero packet erasure rate
(PER) after coding is applied. For the low delay
code this probability is close to zero for sufficiently large
connections (see Section [S]); that is, the PER of the
low delay code is essentially zero.

A comparison of the two codes is shown in Figure 8(a),
where the mean in-order delivery delay $E(D)$ is plotted as
a function of the coding rate for $\pi_B = 0.05$ and $\pi_B = 0.1$
when $E(L) = 1$ (i.e., the packet erasures are i.i.d.). A
similar comparison is shown in Figure 8(b) for correlated
packet losses where $E(L) > 1$. Each solid line shows the
delay of the block code versus the coding rate when the block
size is adjusted to hold the packet erasure rate constant
(as the coding rate increases, the block size must also grow
to achieve the same PER). The mean in-order delivery
delay for the low delay code is shown as a dotted line.

Fig. 9: Mean in-order delivery delay, $E(D)$, versus code
rate on a 25 Mbps link with an RTT of 60 ms. Both
a systematic block code and the low delay code are
shown where feedback is used to signal retransmissions
and packet erasures are correlated ($E(L)$ is the expected
packet erasure burst duration). The non-uniqueness in the
abscissa for the block code when $E(L) = 4$ occurs when the
probability of decoding each generation or block without
needing to retransmit additional degrees of freedom is very
low. Panels: (a) $\epsilon = 0.05$; (b) $\epsilon = 0.1$.

It can be seen that the low delay code achieves a smaller
in-order delivery delay than block codes for the cases
that are of the most interest. The reduction in delay is
substantial, being an order of magnitude or more
for any given code rate. The regime where this is not the
case is when the rate of decoding failures for the block 
codes is large (e.g. PER > 10“^ when compared to a 
channel packet erasure rate of 0.05) and the coding rate 
is small (e.g., $c < 0.8$). Recall that the low delay code has
a PER $\approx 0$, so the delay comparison is not really fair
within this regime. Also note that the block code in this
regime typically has a block size of only 1 to 3 packets,
which is far smaller than is usual for block codes. As both
the code rate and block size are increased, quantization
due to these small block sizes results in the fluctuations
shown in the delay-rate curves within Figure 8.

It can also be seen from Figure 8(b) that the block code's
in-order delivery delay is much more sensitive to correlated
losses than the low delay code's delay. This is a result of the
low delay code removing the requirement to partition the
packet stream into blocks or generations prior to coding.

Simulation results for the closed-loop case are shown in
Figure 9. The major difference between this and Figure 8
is that feedback is used to help communicate the receiver's
need for additional degrees of freedom. When considering
the block code, feedback is used to initiate retransmissions
in the form of coded packets if a block cannot be decoded.
These retransmissions occur until every block can be
decoded and delivered. When considering the low delay
code, feedback is used to adjust the code rate to ensure
frequent decoding opportunities.

Fig. 10: Testbed setup.

While the gain in delay is not as pronounced as in the
open-loop case, the low delay code achieves a lower
in-order delivery delay than the block code over the entire
range of code rates (measured as the total number of
information packets divided by the total number of both
transmitted information and coded packets). Furthermore,
the figure highlights the inability of block codes to recover
from correlated losses. This is shown by the non-uniqueness
in the abscissa. Each curve is generated by
increasing the forward error correction (FEC) code rate.
For smaller FEC rates where a decoding error occurs,
retransmissions are necessary, resulting in larger delays.
As the FEC is increased, the probability of requiring
retransmissions to decode a block decreases, resulting in
lower delay.

7.2 Experimental Results 

We implemented both the low delay code and the systematic
random linear block code within the coded TCP
transport protocol, CTCP [S]. Use of error-correction coding
at the transport layer to reduce delay was one of the
original motivations for the present work and is currently
the subject of much interest (e.g., as part of the Google
QUIC protocol). Since delayed ACK feedback is available
at the transport layer, the setup is similar to the closed-loop
case above. The congestion control used in CTCP
was disabled by fixing the congestion window (cwnd) to
be equal to the path bandwidth-delay product (BDP) in
each of the experiments in order to focus on the coding
performance.
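
As a small worked example of the fixed congestion window, the BDP of a 25 Mbps link with a 60 ms RTT can be converted to packets. The 1500-byte packet size is an assumption made here (it is not stated in the text); with it, the result agrees with the 125-packet BDP quoted alongside the experimental figures. The helper name `bdp_packets` is hypothetical.

```python
def bdp_packets(link_rate_bps: float, rtt_s: float, packet_bytes: int = 1500) -> int:
    """Bandwidth-delay product expressed in packets (packet size is an assumption)."""
    bdp_bits = link_rate_bps * rtt_s
    return round(bdp_bits / (8 * packet_bytes))


if __name__ == "__main__":
    # 25 Mbps x 60 ms = 1.5 Mbit in flight, i.e. 125 packets of 1500 bytes.
    print(bdp_packets(25e6, 0.060))
```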

The testbed used consists of three commodity servers
(Dell PowerEdge 850, 3 GHz Xeon, Intel 82571EB Gigabit
NIC) connected via a router and gigabit switches (Figure
10). Both the server and client machines ran a Linux
2.6.32.27 kernel, while the router ran a FreeBSD 4.11
kernel. ipfw-dummynet was used on the router to configure
various propagation delays $T$, packet loss rates $p$, queue
sizes $Q$ and link rates $B$. As indicated in Figure 10, packet
losses in dummynet occur before the rate constraint, not
after, so that the bottleneck link capacity $B$ is not reduced.
Finally, data traffic is generated using HTTP traffic with
apache2 (version 2.2.8) and wget (version 1.10.2).

Figure 11 compares the mean in-order packet delivery
delay, $E(D)$, for the low delay code and the systematic
block code with different block sizes using the same coding
rate. Observe that there is an "optimal" block size where
the block code achieves the lowest delay (a behavior also
highlighted in [lHl]), and the low delay code achieves a
mean delay that is about half of this value for the link
used in the experiment. Assuming that the block size is not
tuned with respect to the path characteristics, the delay
improvement can be significantly larger. Furthermore, the
two time history snapshots shown in Figure 12 also verify
that the delay due to head-of-line blocking is reduced.

Fig. 11: Mean in-order packet delivery delay, $E(D)$, versus
the block size (in packets) for a random linear block code and low
delay code. A link rate of 25 Mbps, RTT of 60 ms, packet
erasure rate of 10%, redundancy of 15%, and a cwnd fixed
at the BDP (125 packets) was used.

Fig. 12: Time history snapshots of the in-order delivery
delay, $E(D)$, for a random linear code with $k = 64$ packets
and the low delay code. A link rate of 25 Mbps, RTT of
60 ms, packet erasure rate of 10%, redundancy of 15%,
and a cwnd fixed at the BDP was used. Panels: (a) block
code ($k = 64$); (b) low delay code.

Finally, the mean in-order delivery delay versus the
reciprocal of the code rate is shown in Figure 13. Measurements
are shown for the block code over a range of
block sizes, in addition to measurements taken using the
low delay code. The initial code rate for both the block 
and low delay code was set to be Since feedback 

is used, there are no decoding failures for either type of 
code. As expected based on m and Figure [m the block 
code’s in-order delivery delay is a function of the coding 
rate and block size. For the initial coding rate used in 
the experiment, the minimum delay occurs for block sizes 
between k = 32 and k = 64. 

Within our experiments, the low delay code achieves 
a smaller delay over the entire range of coding rates. 
Specifically, the in-order delay is always smaller when 
using the low-delay code for a given throughput. Recall 
that the block code used for comparison is a modern, high 
performance code, and we expect similar behaviours when 
other types of block codes are used. While not shown here, 
similar results were obtained for a wide range of network 
bandwidths and RTTs. 
Fig. 13: Mean in-order packet delivery delay, $E(D)$, versus
the number of packets transmitted divided by the number 
of information packets for a systematic random linear 
block code and low delay code. A link rate of 25 Mbps, 
RTT of 60 ms, loss rate of 10%, and a cwnd fixed at the 
BDP was used. 

8 Conclusions 

We introduced a class of streaming codes for packet
erasure channels that is capacity achieving and provides
a superior throughput-delay trade-off compared to block
codes. By introducing the flexibility of where and when
to place redundancy, our code construction achieves
significantly lower in-order delay for a given throughput.
Furthermore, the mathematical analysis of throughput
and delay is a primary contribution of the paper. This
analysis required the development of a number of novel
analytic tools based on both queuing and coding theory.
Finally, simulation and experimental results were provided
showing the performance of our code construction. While
the mathematical analysis is confined to operation without
feedback, we showed that feedback can naturally be
combined with low delay codes in both the simulation and
experimental results.

While this paper focused on many of the analytical
aspects of low delay codes, future work is still required.
The decoding failure probability discussed in Section 5 can
be further reduced by treating our code like a standard
rateless code when terminating a packet stream. The
interaction between congestion control and the low delay
code, as well as its performance when TCP is used, is also
of interest. Finally, we believe, based on work done by
[SH], that this code construction can help improve network
performance by removing or replacing complex lower layer
reliability schemes such as hybrid ARQ (H-ARQ).
Appendix I: Proof of Theorem 1

The following lemma corresponds to a result of Tanner [7]
for the busy time distribution of a G/D/1 queue.

Lemma 8 (Tanner [7]). Consider a decoding procedure at
time $t = 0$ with $r$ erasures already occurred in $t < 0$.
Suppose that the decoder stops decoding at time $t = kl$
when it receives the coded packet, so that $S = k$. The following are
necessary and sufficient conditions for a decoding period
to contain just $k$ decoding packets given that the decoding
process starts with $r$ erasures in the previous interval:

(i) precisely $k - r$ erasures happen in the period of $kl$;

(ii) the pattern of these erasures is admissible. The
term admissible is used here to mean that at least one
erasure should happen during a time $t = (r+1)l$, at
least two within $(r+2)l$, and so on, so that the decoding
process will remain activated for the whole of the time
$t = kl$.

The probability of (i) is

$$B(k-r;\, \epsilon, kl) = \binom{kl}{k-r}\, \epsilon^{k-r} (1-\epsilon)^{kl-k+r}, \qquad (18)$$

and the probability of any pattern of the $k - r$ erasures being
admissible is $r/k$.

Lemma 9. For integers $k \ge 2$ and $l \ge 2$,

$$\sum_{r=2}^{\min(k,l)} (r-1) \binom{l}{r} \binom{(k-1)l}{k-r} = \binom{(k-1)l}{k}.$$

Proof: We use the following identities that follow from
Vandermonde's identity:

$$\sum_{r=0}^{\min(k,l)} \binom{l}{r} \binom{(k-1)l}{k-r} = \binom{kl}{k}, \qquad
\sum_{r=0}^{\min(k,l)} r \binom{l}{r} \binom{(k-1)l}{k-r} = l \binom{kl-1}{k-1}.$$

Since $\binom{kl}{k} = l \binom{kl-1}{k-1}$, the sum of
$(r-1)\binom{l}{r}\binom{(k-1)l}{k-r}$ over $0 \le r \le \min(k,l)$ is zero.
The $r = 1$ term vanishes and the $r = 0$ term equals
$-\binom{(k-1)l}{k}$, which gives the claim. □

Proof of Theorem 1, Part I: We know that for the
decoding process to go beyond $k$ we need at least $k$
erasures in the first $kl$ interval. There are
cases with more than $k$ erasures in the first $kl$ interval
in which the decoding process stops before $k$, but for it to be
greater than $k$ we must have at least $k$ erasures. Let
$E_{kl}$ denote the number of erasures in the interval $kl$. We
have $P(S > k \mid E_{kl} < k) = 0$. Formally,

$$\begin{aligned}
P(S > k) &= P(E_{kl} \ge k)\, P(S > k \mid E_{kl} \ge k) + P(E_{kl} < k)\, P(S > k \mid E_{kl} < k) \\
&= P(E_{kl} \ge k)\, P(S > k \mid E_{kl} \ge k) \le P(E_{kl} \ge k) = P(B(\epsilon, kl) \ge k) \\
&\le Q\!\left( \sqrt{k}\, \frac{1 - l\epsilon}{\sqrt{l\epsilon(1-\epsilon)}} \right)
\le \frac{1}{2} \exp\!\left( -\frac{k (1 - l\epsilon)^2}{2 l \epsilon (1-\epsilon)} \right).
\end{aligned}$$

The Q-function is the tail probability of the standard
normal distribution, $Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} \exp(-u^2/2)\, du$. In
the last part we used the Chernoff bound and the normal
distribution $\mathcal{N}(\epsilon kl,\, kl\epsilon(1-\epsilon))$ as an approximation of the
binomial distribution for large values of $k$. This bound
only holds if $l\epsilon < 1$. We know $E(S) = \sum_{k \ge 0} P(S > k)$.
Putting these two together, we can deduce that $E(S)$ is
finite if $l\epsilon < 1$. □

Proof of Theorem 1, Part II: The goal is to determine
$P(S = k)$. We have $S = 0$ if there is no erasure
in the first $l$-interval or only the coded packet is erased,
so $P(S = 0) = (1-\epsilon)^{l} + \epsilon(1-\epsilon)^{l-1} = (1-\epsilon)^{l-1}$. We
have $S = 1$ if one and only one of the information
packets is lost in the first $l$-interval, so
$P(S = 1) = (l-1)\epsilon(1-\epsilon)^{l-1}$. For $k > 1$ we need at least $r = 2$
erasures in the first $l$-interval, which has probability
$P(B(\epsilon, l) = r) = \binom{l}{r} \epsilon^{r} (1-\epsilon)^{l-r}$ for $2 \le r \le l$. Starting
from the second decoding $l$-interval, there are $r - 1$ erasures
to be taken care of. The reason is that either the first coded
packet is erased, which means $r - 1$ information packets
are erased, or the first coded packet is not erased and it
will eventually help us to decode one of the $r$ erasures.
Using Lemma 8, knowing that there have been $r$ erasures
in the first $l$-interval, we have for $k > 1$,
$P(S = k \mid r) = \frac{r-1}{k-1} P(B(\epsilon, l(k-1)) = k - r)$. This means
that we can allow $k - r$ erasures in the $l(k-1)$-interval.
Putting these two together we have

$$\begin{aligned}
P(S = k) &= \sum_{r=2}^{\min(k,l)} P(B(\epsilon, l) = r)\, \frac{r-1}{k-1}\, P(B(\epsilon, l(k-1)) = k - r) \\
&= \frac{\epsilon^{k} (1-\epsilon)^{k(l-1)}}{k-1} \sum_{r=2}^{\min(k,l)} (r-1) \binom{l}{r} \binom{(k-1)l}{k-r}.
\end{aligned}$$

Using Lemma 9 we have

$$\begin{aligned}
P(S = k) &= \frac{1}{k-1}\, \epsilon^{k} (1-\epsilon)^{k(l-1)}\, \frac{((k-1)l)!}{k!\, ((k-1)l - k)!} \\
&= \frac{(k-1)l - k + 1}{(k-1)k}\, \epsilon^{k} (1-\epsilon)^{k(l-1)}\, \frac{((k-1)l)!}{(k-1)!\, ((k-1)l - k + 1)!} \\
&= \frac{(k-1)(l-1)}{(k-1)k}\, \epsilon^{k} (1-\epsilon)^{k(l-1)} \binom{(k-1)l}{k-1} \\
&= \frac{l-1}{k}\, \epsilon^{k} (1-\epsilon)^{k(l-1)} \binom{(k-1)l}{k-1}.
\end{aligned}$$

□
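
The binomial sum that appears in Part II, and the two equivalent closed forms of the coefficient of $\epsilon^{k}(1-\epsilon)^{k(l-1)}$, can be checked with exact integer arithmetic. This is an illustrative sanity check rather than part of the proof; the function names are hypothetical.

```python
from math import comb


def vandermonde_sum(k: int, l: int) -> int:
    """Sum over r = 2..min(k, l) of (r - 1) * C(l, r) * C((k - 1) l, k - r)."""
    m = (k - 1) * l
    return sum((r - 1) * comb(l, r) * comb(m, k - r) for r in range(2, min(k, l) + 1))


def closed_form(k: int, l: int) -> int:
    """Right-hand side of the identity: C((k - 1) l, k)."""
    return comb((k - 1) * l, k)


def coefficient_forms_agree(k: int, l: int) -> bool:
    """Check C(m, k) / (k - 1) == (l - 1) / k * C(m, k - 1), cross-multiplied."""
    m = (k - 1) * l
    return k * comb(m, k) == (k - 1) * (l - 1) * comb(m, k - 1)


if __name__ == "__main__":
    for k in range(2, 8):
        for l in range(2, 8):
            assert vandermonde_sum(k, l) == closed_form(k, l)
            assert coefficient_forms_agree(k, l)
    print("identity holds for 2 <= k, l <= 7")
```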

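Proof of Theorem 1, Part III: We know that for
$l\epsilon < 1$ the mean of the probability distribution of $S$ exists.
We have $\sum_{k=0}^{\infty} P(S = k) = 1$, which means

$$\sum_{k=1}^{\infty} \frac{l-1}{k} \binom{(k-1)l}{k-1} \epsilon^{k} (1-\epsilon)^{k(l-1)} = 1 - (1-\epsilon)^{l-1}.$$

By taking derivatives with respect to $\epsilon$, we have

$$\frac{1 - l\epsilon}{\epsilon(1-\epsilon)} \sum_{k=1}^{\infty} k\, \frac{l-1}{k} \binom{(k-1)l}{k-1} \epsilon^{k} (1-\epsilon)^{k(l-1)} = (l-1)(1-\epsilon)^{l-2},$$

and by re-arranging, we have
$E(S) = \frac{(l-1)\,\epsilon\,(1-\epsilon)^{l-1}}{1 - l\epsilon}$. By taking derivatives with
respect to $\epsilon$ again, we have
$\frac{dE(S)}{d\epsilon} = \frac{1 - l\epsilon}{\epsilon(1-\epsilon)}\, E(S^2)$, which means

$$E(S^2) = \frac{\epsilon(1-\epsilon)}{1 - l\epsilon}\, \frac{dE(S)}{d\epsilon} = E(S) + \frac{l(l-1)\,\epsilon^2 (1-\epsilon)^{l}}{(1 - l\epsilon)^3}.$$

□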
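
The moment expressions can be sanity-checked numerically by summing the distribution of $S$ directly. The sketch below assumes the closed form $P(S = k) = \frac{l-1}{k}\binom{(k-1)l}{k-1}\epsilon^{k}(1-\epsilon)^{k(l-1)}$ for $k \ge 1$ and $P(S = 0) = (1-\epsilon)^{l-1}$, works in the log domain to avoid overflow, and truncates the sum at a hypothetical `kmax`; all function names are illustrative.

```python
from math import exp, lgamma, log


def pmf_S(k: int, l: int, eps: float) -> float:
    """P(S = k) for the busy-period length S (assumed closed form, see lead-in)."""
    if k == 0:
        return (1.0 - eps) ** (l - 1)
    m = (k - 1) * l
    # log C(m, k - 1) = log m! - log (k - 1)! - log (m - k + 1)!
    log_binom = lgamma(m + 1) - lgamma(k) - lgamma(m - k + 2)
    log_p = log(l - 1) - log(k) + log_binom + k * log(eps) + k * (l - 1) * log(1.0 - eps)
    return exp(log_p)


def numeric_moments(l: int, eps: float, kmax: int = 2000) -> tuple[float, float, float]:
    """Return (total probability, E[S], E[S^2]) from the truncated sum."""
    probs = [pmf_S(k, l, eps) for k in range(kmax + 1)]
    total = sum(probs)
    mean = sum(k * p for k, p in enumerate(probs))
    second = sum(k * k * p for k, p in enumerate(probs))
    return total, mean, second


def closed_form_moments(l: int, eps: float) -> tuple[float, float]:
    """E[S] and E[S^2] from the expressions derived in Part III (requires l * eps < 1)."""
    mean = (l - 1) * eps * (1.0 - eps) ** (l - 1) / (1.0 - l * eps)
    second = mean + l * (l - 1) * eps**2 * (1.0 - eps) ** l / (1.0 - l * eps) ** 3
    return mean, second


if __name__ == "__main__":
    print(numeric_moments(5, 0.1))
    print(closed_form_moments(5, 0.1))
```

The truncation point only matters close to capacity ($l\epsilon \to 1$), where the tail of $S$ decays slowly and a larger `kmax` would be needed.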


Appendix II: Proof of Theorem 2

Proof: Consider the transmission of a stream with length
$N_t$, a multiple of $l$, at time $t$. This translates to $N_t/l$
$l$-intervals in time $t$. Assume that the decoding process
consists of $l$-intervals of length $s_1, s_2, \ldots, s_n, \ldots$. Recall
that $S_1^+, S_2^+, \ldots$ is a sequence of positive independent
identically distributed random variables. Consider
the $S^+$ process, and define $J_n = \sum_{i=1}^{n} S_i^+$ for each $n > 0$,
so that the renewal interval $[J_n, J_{n+1}]$ is a decoding
$l$-interval. Then the random variable $(X_t)_{t \ge 0}$ given by
$X_t = \sum_{n=1}^{\infty} \mathbb{I}\{J_n \le t\} = \sup\{n : J_n \le t\}$ (where $\mathbb{I}$ is the indicator
function) represents the number of decoding $l$-intervals
that have occurred by time $t$, and is a renewal process.

Let $W_1, W_2, \ldots$ be a sequence of i.i.d. random variables
denoting the sum of in-order delivery delay in each decoding
$l$-interval. We have two cases to consider. Case (i):
suppose the $j$-th period is an idle period. Then $S_j = 0$
and the information packets are delivered in-order with
no delay. Case (ii): suppose the $j$-th period is a busy
period and the information packet erasure that initiated
the busy period started in the first slot -I- 1. Then 
the first information packet is delayed by Sjl slots, the 
second by — 1 slots and so on. The sum-delay over all 
of the information packets in the busy period is therefore 
, _ sr^Sj-l Sp{i-i) 

2^k=l 2^k=0 ^ 2 

The random variable $Y_t = \sum_{i=1}^{X_t} W_i$ is a renewal-reward
process and its expectation is the sum of in-order delivery
delay over the time-span of $t$. Based on the elementary
renewal theorem for renewal-reward processes, we have
$\lim_{t \to \infty} \frac{1}{t} E[Y_t] = \frac{E[W_1]}{E[S_1^+]}$. Based on the construction of $W_1$,

we have E[lTi] = E“iE[Wi|S'+ = i] Pr(5'+ = i) = 
E“o'E[Wi|5'i = i]Pr(S'i = i) = ^^ElS^]. We have 
limt_).oo yEiht] < (iZe)i ^*^^2 Since we assumed that 

Nt = tl, we have: limAr^-^oo E[S'^] 

□ 


Appendix III: Proof of Theorem 3

Unfortunately, we cannot extend Tanner's result to
this case, which is equivalent to computing the busy time
of a G/D/c queue. Instead we use the following lemma:

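Lemma 10 ([33]). Consider the strictly increasing
integer infinite sequence $\mathbf{a} = (a_1, a_2, \ldots, a_n, \ldots)$. The
number $\mathcal{N}(\mathbf{a}, n)$ of sequences of non-negative integers
$(b_1, \ldots, b_n, \ldots)$ dominated by the sequence
$(a_1, a_2 - a_1, \ldots, a_n - a_{n-1}, \ldots)$ for a given $n$, so that

$$\begin{aligned}
b_1 &\le a_1 \\
b_1 + b_2 &\le a_2 \\
&\;\;\vdots \\
b_1 + b_2 + \cdots + b_n &\le a_n
\end{aligned}$$

is equal to the determinant of the following matrix:

$$\begin{pmatrix}
\binom{a_n+1}{1} & 1 & 0 & \cdots & 0 \\
\binom{a_{n-1}+1}{2} & \binom{a_{n-1}+1}{1} & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \ddots & \vdots \\
\binom{a_2+1}{n-1} & \binom{a_2+1}{n-2} & \binom{a_2+1}{n-3} & \cdots & 1 \\
\binom{a_1+1}{n} & \binom{a_1+1}{n-1} & \binom{a_1+1}{n-2} & \cdots & \binom{a_1+1}{1}
\end{pmatrix}$$

which can be computed recursively as follows (with $\mathcal{N}(\mathbf{a}, 0) = 1$):

$$\mathcal{N}(\mathbf{a}, n) = \sum_{j=1}^{n} (-1)^{j-1} \binom{a_{n-j+1}+1}{j}\, \mathcal{N}(\mathbf{a}, n-j).$$
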
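
The recursion in Lemma 10 can be checked against exhaustive enumeration for short sequences. The sketch below assumes the inequalities are non-strict ($\le$) and that $\mathcal{N}(\mathbf{a}, 0) = 1$; the function names are hypothetical.

```python
from itertools import product
from math import comb


def count_recursive(a: list[int]) -> int:
    """N(a, n) via the alternating recursion, with a = [a_1, ..., a_n]."""
    counts = [1]  # N(a, 0) = 1 (assumed base case)
    for n in range(1, len(a) + 1):
        total = 0
        for j in range(1, n + 1):
            # a[n - j] is a_{n-j+1} in the 1-indexed notation of the lemma.
            total += (-1) ** (j - 1) * comb(a[n - j] + 1, j) * counts[n - j]
        counts.append(total)
    return counts[-1]


def count_brute_force(a: list[int]) -> int:
    """Count non-negative integer sequences whose partial sums stay below a_i."""
    count = 0
    for b in product(range(a[-1] + 1), repeat=len(a)):
        partial = 0
        dominated = True
        for b_i, a_i in zip(b, a):
            partial += b_i
            if partial > a_i:
                dominated = False
                break
        count += dominated
    return count


if __name__ == "__main__":
    for seq in ([1, 2, 3], [0, 1, 2], [2, 5, 6, 9]):
        print(seq, count_recursive(seq), count_brute_force(seq))
```

The recursion is the cofactor expansion of the determinant along its first row, so it needs only $O(n^2)$ binomial evaluations instead of an enumeration that grows exponentially in $n$.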


Proof of Theorem 3, Part I: We know that for the
decoding process to go beyond $k$ we need at least $ck$
erasures in the first $kl_g$ interval. There are
cases with more than $kc$ erasures in the first $kl_g$ interval
in which the decoding process stops before $k$, but for it to be
greater than $k$ we must have at least $ck$ erasures. Let
$E_{kl_g}$ denote the number of erasures in the interval $kl_g$.
We have $P(S > k \mid E_{kl_g} < ck) = 0$. Formally,

$$\begin{aligned}
P(S > k) &= P(E_{kl_g} \ge ck)\, P(S > k \mid E_{kl_g} \ge ck) + P(E_{kl_g} < ck)\, P(S > k \mid E_{kl_g} < ck) \\
&= P(E_{kl_g} \ge ck)\, P(S > k \mid E_{kl_g} \ge ck) \le P(E_{kl_g} \ge ck) = P(B(\epsilon, kl_g) \ge ck) \\
&\le Q\!\left( \sqrt{k}\, \frac{c - l_g\epsilon}{\sqrt{l_g\epsilon(1-\epsilon)}} \right)
\le \frac{1}{2} \exp\!\left( -\frac{k (c - l_g\epsilon)^2}{2 l_g \epsilon (1-\epsilon)} \right).
\end{aligned}$$

This bound only holds if $l_g\epsilon < c$. We know
$E(S) = \sum_{k \ge 0} P(S > k)$. Putting these
two together, we can deduce that $E(S)$ is finite if $l_g\epsilon < c$.

□
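
The last step of the tail bound uses the Chernoff bound $Q(x) \le \frac{1}{2} e^{-x^2/2}$ for $x \ge 0$. A minimal numerical illustration follows; the argument $x = \sqrt{k}\,(c - l_g\epsilon)/\sqrt{l_g\epsilon(1-\epsilon)}$ is the one obtained from the normal approximation with mean $\epsilon k l_g$ and variance $k l_g \epsilon(1-\epsilon)$, and the function names and parameter values are assumptions for this sketch.

```python
from math import erfc, exp, sqrt


def q_function(x: float) -> float:
    """Tail probability of the standard normal distribution."""
    return 0.5 * erfc(x / sqrt(2.0))


def chernoff_bound(x: float) -> float:
    """Chernoff-type upper bound on Q(x), valid for x >= 0."""
    return 0.5 * exp(-x * x / 2.0)


def tail_estimates(k: int, l_g: int, eps: float, c: int = 1) -> tuple[float, float]:
    """Return (Q(x), bound) at the normal-approximation argument for P(S > k)."""
    x = sqrt(k) * (c - l_g * eps) / sqrt(l_g * eps * (1.0 - eps))
    return q_function(x), chernoff_bound(x)


if __name__ == "__main__":
    for k in (1, 5, 20, 50):
        print(k, tail_estimates(k, l_g=5, eps=0.1))
```

Both quantities decay exponentially in $k$ whenever $l_g\epsilon < c$, which is what makes $E(S)$ finite.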

Proof of Theorem\^ Part II: The goal is to determine 
P{S = k). We know that P{S = 0) if there is none of 
the information packets in the first Zg-interval is erased 
P(S = 0) = (1 - We also know that P{S = 1) 

if at least one information packet has been lost and the 
total number of erasures in the first Ig interval is at most 

C, so P{S = 1) = E-=1 E-Zl CT) We 

know that for $S = k > 1$, we need the number of erasures
in the $kl_g$ interval to be between $kc - c + 1$ and $kc$. We
also need at least $c + 1$ erasures in the first $l_g$-interval, at
least $2c + 1$ erasures in the first $2l_g$-interval and so on, so
that the decoding process remains activated.

Suppose that $S = k$ and consequently the number of
erasures in the $kl_g$ interval is $kc - p$ for a given $p$, $0 \le p \le c - 1$.
In the remainder of this part of the proof, the goal
is to find the number $NP(l_g, c, k, p)$ of erasure patterns
with $kc - p$ erasures so that the decoding process remains
activated until the reception of coded packets in the $k$-th
$l_g$-interval.

We formulate an erasure pattern by a $\{0,1\}$ sequence $x$
of length $kl_g$. We set $x_i = 1$ if an erasure occurs
at time $i$. We describe the positions of the 1's in $x$ by
the sequence $u = (u_1, u_2, \ldots, u_{kc-p})$, where the $i$-th erasure
is found in position $u_i$ in the sequence $x$, and denote by
$\mu = (\mu_1, \mu_2, \ldots, \mu_{kc-p})$ the sequence of differences
$\mu_i = u_i - u_{i-1}$ (with $\mu_1 = u_1$). Assuming that there exist $kc - p$
erasures and the decoding process is activated until the $k$-th
$l_g$ interval, we know that there should exist $c + 1$ erasures
in the first $l_g$ interval. Hence for an arbitrary admissible
erasure pattern $x$, the first erasure cannot occur later than
the time $l_g - c$, so we have $\mu_1 \le l_g - c$. Following the same
argument, $\mu$ should satisfy the following inequalities:

$$\begin{aligned}
\mu_1 &\le l_g - c \\
\mu_1 + \mu_2 &\le l_g - c + 1 \\
&\;\;\vdots \\
\mu_1 + \mu_2 + \cdots + \mu_{c+1} &\le l_g \\
\mu_1 + \mu_2 + \cdots + \mu_{c+1} + \mu_{c+2} &\le 2l_g - c + 1 \\
\mu_1 + \mu_2 + \cdots + \mu_{c+2} + \mu_{c+3} &\le 2l_g - c + 2 \\
&\;\;\vdots \\
\mu_1 + \mu_2 + \cdots + \mu_{2c} + \mu_{2c+1} &\le 2l_g \\
&\;\;\vdots \\
\mu_1 + \mu_2 + \cdots + \mu_{(k-2)c+1} + \mu_{(k-2)c+2} &\le (k-1)l_g - c + 1 \\
\mu_1 + \mu_2 + \cdots + \mu_{(k-2)c+2} + \mu_{(k-2)c+3} &\le (k-1)l_g - c + 2 \\
&\;\;\vdots \\
\mu_1 + \mu_2 + \cdots + \mu_{(k-2)c+c} + \mu_{kc-c+1} &\le (k-1)l_g \\
\mu_1 + \mu_2 + \cdots + \mu_{kc-c+1} + \mu_{kc-c+2} &\le kl_g - c + p + 1 \\
\mu_1 + \mu_2 + \cdots + \mu_{kc-c+2} + \mu_{kc-c+3} &\le kl_g - c + p + 2 \\
&\;\;\vdots \\
\mu_1 + \mu_2 + \cdots + \mu_{kc-p-1} + \mu_{kc-p} &\le kl_g
\end{aligned}$$

Using Lemma 10, we can enumerate such admissible erasure
patterns for given values of $k, l_g, c, p$. □


Appendix IV: Proof of Theorem 4

Proof of Theorem 4: Assume that the
decoding process consists of intervals of lengths
$s_1, s_2, s_3, \ldots, s_k, s_{k+1}, \ldots$, and without loss of
generality, assume that

k ^ k+i 

= o}j<fe-i- 'y ^ Sj < — < ~ o}j<fe+i + y ^ Sj. (23) 

9=1 9=1 


Note that if 5” = 0, then no erasure has occurred in the 
(I — l)-interval and all information packets are received 
without delay. The throughput of this scheme is GT = 
{I - l)(#{s, = 0},<fc + Y.Usg)/N = [I- l)(#{s, = 
0}j<fe+i + Sj - Sfe+I + I(sfe+1 = 0))/N, where 

X(sfe+i = 0) is the indicator function and is equal to one if 
s/c+i = 0 and otherwise it is zero. From (ESD, we know 
From theorem [TJ we know the probability distribution 
of Sk+i, so Pr (GT > ^ - = Pr(^fc+i < /). In 

particular, using the Chernov bound for Pr(S' > /), we 


have Pr (GT > ^ - 
0 = 


N 


) > 1 - ie . Setting 




Ig f ^ TffQ can solve for fs. Assume that Rq = 


^-7, for any N > we have Pr(Gr > Rq) > 1-,^. 

□ 

Appendix V: Proof of Theorems 5, 6 and
Corollary [5]

Proof of Theorem 5:

Consider an admissible decoding matrix $G_{k \times k}$ for a
given erasure pattern $E_1, E_2, \ldots, E_k$. Looking at this matrix
from a column perspective, the number of zeros in each
column $i$, denoted by $Z_i$, is equal to the number of rows $j$
in which $\sum_{m=1}^{j} E_m$ is strictly less than $i$. Based on this
construction, the sequence $Z_i$ is increasing. Let $\{v_1, \ldots, v_k\}$
be the columns of the decoding matrix $G$. The number of
zeros $Z_i$ in each vector $v_i$ can be computed based on the
erasure pattern. The vector $v_i$ is a $k$-dimensional vector
whose first $Z_i$ elements are forced to be zero and whose other
$k - Z_i$ elements are drawn identically and independently from a
uniform distribution over an alphabet of size $Q$.

Our argument in this part of the proof is based on
Landsburg's combinatorial argument in [34]. We now
compute the probability that the $k$ columns $\{v_1, \ldots, v_k\}$ of $G$
are linearly independent. Starting from the last vector $v_k$,
we know that it should be non-zero, which happens with
probability $\frac{Q^{k-Z_k} - 1}{Q^{k-Z_k}} = 1 - \frac{1}{Q^{k-Z_k}}$. The vector $v_{k-1}$ must
lie outside of the one-dimensional subspace containing 0
and the vector $v_k$. This happens with probability
$\frac{Q^{k-Z_{k-1}} - Q}{Q^{k-Z_{k-1}}} = 1 - \frac{1}{Q^{k-Z_{k-1}-1}}$. The $i$-th vector must
lie outside the $(k-i)$-dimensional subspace spanned by
the last $k - i$ vectors $\{v_k, v_{k-1}, \ldots, v_{i+1}\}$. This happens
with probability $\frac{Q^{k-Z_i} - Q^{k-i}}{Q^{k-Z_i}} = 1 - \frac{1}{Q^{i-Z_i}}$. Putting all
these together, the probability that the decoding matrix
$G_{k \times k}$ is of full rank is given by
$\Pr(\mathrm{rank}(G) = k) = \prod_{i=1}^{k} \left(1 - \frac{1}{Q^{i-Z_i}}\right)$.

Considering different erasure patterns, $\Pr(\mathrm{rank}(G) = k)$
is maximized if all $Z_i$'s are zero. This means that all
the erasures happen in the first $l$-interval, i.e. $E_1 = k$,
$E_2 = 0, \ldots, E_k = 0$ and $Z_1 = Z_2 = \cdots = Z_k = 0$.
In this case the probability of the matrix $G$ being full
rank is given by $\Pr(\mathrm{rank}(G) = k) = \prod_{i=1}^{k} \left(1 - \frac{1}{Q^{i}}\right)$.
$\Pr(\mathrm{rank}(G) = k)$ is minimized if all $Z_i$'s are maximized.
Since $Z_i$ is an increasing sequence, in order to maximize
$Z_i$ we have to also maximize $Z_{i+1}, Z_{i+2}, \ldots$. The admissible
erasure pattern that maximizes the sequence $Z_i$ is
$E_1 = 2, E_2 = 1, E_3 = 1, \ldots, E_{k-1} = 1, E_k = 0$, and
$Z_1 = 0, Z_2 = 0, Z_3 = 1, Z_4 = 2, \ldots, Z_k = k - 2$. This is
justified because if any of the erasures happens sooner, the
number of rows $j$ in which $\sum_{m=1}^{j} E_m$ is strictly less than some
value of $i$ becomes smaller, contradicting the assumption
that the sequence $Z_i$ is maximized. On the other hand,
if any of the erasures occurs in a later $l$-interval, the
erasure pattern will not be admissible. In this case
the probability of the matrix $G$ being full rank is given by
$\Pr(\mathrm{rank}(G) = k) = \prod_{i=1}^{2} \left(1 - \frac{1}{Q^{i}}\right) \prod_{i=3}^{k} \left(1 - \frac{1}{Q^{2}}\right) = \left(1 - \frac{1}{Q}\right)\left(1 - \frac{1}{Q^{2}}\right)^{k-1}$. □
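
The product form of the full-rank probability can be checked by exhaustive enumeration over the binary field for small $k$. This is an illustrative sketch: it assumes $Q = 2$, takes the zero counts $Z_i$ as given rather than deriving them from an erasure pattern, and uses hypothetical function names.

```python
from itertools import product


def rank_gf2(vectors) -> int:
    """Rank over GF(2) of vectors encoded as integers (bit j = entry j)."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)  # clears the leading bit of b from v if it is set
        if v:
            basis.append(v)
    return len(basis)


def full_rank_fraction_gf2(zeros) -> float:
    """Fraction of matrices of full rank when column i has its first Z_i entries forced to 0."""
    k = len(zeros)
    # Column i: entries 0..Z_i-1 are zero, the remaining k - Z_i entries are free bits.
    choices = [[free << z for free in range(2 ** (k - z))] for z in zeros]
    total = 0
    full = 0
    for columns in product(*choices):
        total += 1
        full += rank_gf2(columns) == k
    return full / total


def product_formula(zeros, q: int) -> float:
    """prod_{i=1}^{k} (1 - q^{-(i - Z_i)}) for an increasing zero-count sequence."""
    p = 1.0
    for i, z in enumerate(zeros, start=1):
        p *= 1.0 - q ** -(i - z)
    return p


if __name__ == "__main__":
    for zeros in [(0, 0, 0), (0, 0, 1), (0, 0, 1, 2)]:
        print(zeros, full_rank_fraction_gf2(zeros), product_formula(zeros, 2))
```

The cases $(0, 0, 0)$ and $(0, 0, 1)$ correspond to the best and worst erasure patterns for $k = 3$; for realistic alphabet sizes such as $Q = 2^8$ both products are close to one.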

Proof of Theorem 6: Consider the transmission
of a stream with length $N_t$, a multiple of $l$, at time
$t$. This translates to $N_t/l$ $l$-intervals in time $t$.
Assume that the decoding process consists of
$l$-intervals of length $s_1, s_2, \ldots, s_n, \ldots$. Recall that
$S_1^+, S_2^+, \ldots, S_n^+$ is a sequence of positive independent
identically distributed random variables. Consider the
$S^+$ process and define, for each $n > 0$, $J_n = \sum_{i=1}^{n} S_i^+$,
so that the renewal interval $[J_n, J_{n+1}]$ is a decoding
$l$-interval. Then the random variable $(X_t)_{t \ge 0}$ given by
$X_t = \sum_{n=1}^{\infty} \mathbb{I}\{J_n \le t\} = \sup\{n : J_n \le t\}$ (where $\mathbb{I}$ is the
indicator function) represents the number of decoding
$l$-intervals that have occurred by time $t$, and is a renewal
process. Let $W_1, W_2, \ldots$ be a sequence of i.i.d. random
variables denoting the occurrence of the event of decoding
failure in each decoding $l$-interval. We define $W_i$ to be
$\mathbb{I}$(decoding failure occurring in the $i$-th decoding $l$-interval),
where $\mathbb{I}$ is the indicator function. Note that
$E(W_i \mid S_i^+ = k) = \Pr(W_i = 1 \mid S_i^+ = k)$. Then the random
variable $Y_t = \sum_{i=1}^{X_t} W_i$ is a renewal-reward process and
its expectation is the decoding failure probability over the
time-span of $t$. Based on the elementary renewal theorem
for renewal-reward processes [m], we have:

$$\lim_{t \to \infty} \frac{E[Y_t]}{t} = \frac{E(W_1)}{E(S^+)}. \qquad (24)$$


Based on the construction of Wi, we have 


^Wi] = ^E[lTi|5'i+ = i] Pr(5i+ = i) 

i=l 

00 

= ^E[Wi|5i =i] Pr(Si =i) 

i=0 

00 

(25) 

i=l 

The last step follows from the fact that the decoding 
failure probability is zero if 5i = 0 and for all 5i > 1 
the decoding failure probability is upper bounded by 
(1 — ^)*^ following the theorem O Putting together equa¬ 
tions (EU,® and Corollary[Tl we have limt_>oo < 

- ^(1 - = i) Since we 

assumed that Nt = tl, we have: limArj_>oo 7^]E[Tt] < 

A(l-^)‘)Pr(Si = i). ’ □ 

Proof of Corollary: Consider the function $f(\epsilon) =
\epsilon(1-\epsilon)^{l-1}$ and its derivative $f'(\epsilon) = (1 - l\epsilon)(1-\epsilon)^{l-2}$.
$f'(\epsilon)$ is positive for $\epsilon < 1/l$, which means $f(\epsilon)$ is an
increasing function on this range. This guarantees that $\epsilon_0$, the solution
to the equation eo(l — eo)^“^ = (1 — ^)e(l — e)*“^ 
exists and cq < e < j. From theorem [H we 
have = 1 - (1 - and 

Er=iT/'^(^o)((tV0 = 1 - (1 - eo)'-h At 

this point, setting a = (l-iy and following the 
theorem we can compute the decoding failure 
probability: lim^Vj-^oo Pr(DF) < - ^(1 - 

^)fc) p,^s = k)=a EZi Pr is = k)- Er=i (1 - 
^)fcpr(5 = fc) = " 

a^EZi = a(l-(l-e)'-i-^(l- 

(1 - eo)'-i)) = a(^(l - eo)'-i - (1 - + ^). 


□ 







